dolftax · on Nov 2, 2021

DeepSource | Bangalore & San Francisco | Fulltime | https://deepsource.io/

DeepSource is a fast and reliable static analysis platform for developers and engineering teams. We've various roles open across Platform Engineering, Language Engineering and Marketing - https://careers.deepsource.io/

dolftax · on April 20, 2020

"And we’re still accepting late applications if you’ve always wanted to do YC but couldn’t move out to the bay area" - Aaron from YC.

https://twitter.com/aaron_epstein/status/1252267533555470338...

foreign-inc · on April 20, 2020

How many companies from those late applications get accepted every batch?

gscott · on April 20, 2020

Now that ycombinator takes over a hundred startups it does seem more promising to get in, eventually.

dolftax · on April 13, 2020

> There's a better solution: use open-source cli tools that do just that!

We do not deny that you can run the open-source tools locally, be it with a one-line command or by setting up pylint or flake8 with dedicated configurations. DeepSource is meant to eliminate the need to set up all those open-source tools locally or in your CI pipeline, and to address the following pain points:

- Fishing for issues among hundreds of lines of logs in the CI.

- Figuring out and updating linter config to remove duplicates and false positives (for ex: Bandit throws errors like `assert statement used` in a test file, which is a false positive, because Bandit doesn’t know by default that it is a test file).

- Some issues need a better description of why they are issues. For ex: why should default file permissions be 0600? A justification of why that is necessary helps.

- By default, linters run on all the files on every commit or pull request.

- If an issue occurs in, say, 50 places, one has to fix each occurrence manually.

> 1. 520 Python checks? Use `wemake-python-styleguide` (wrapper around flake8) that has bigger amount of checks: https://github.com/wemake-services/wemake-python-styleguide There's also `pylint` with a set of awesome checks as well.

Our focus at the moment is not on style issues. In fact, among the categories of issues we raise (anti-patterns, bug risks, performance, security, style, documentation), style issues are the most debated by our users, as they are really subjective. We’re thinking of making style issues opt-in instead of raising them by default, and are working on running formatters like `black` and `yapf` with a single-line config in `.deepsource.toml`. Our analyzer team actively adds custom rules which you don’t get from the open-source tools. The following issues, for example:

- Raising another exception when `assert` fails is ineffective. For ex: `assert isinstance(num_channels, int), ValueError('Number of image channels needs to be an integer')`

- If the condition would not be satisfied, user would be expecting a `ValueError`, but this would be raised: `AssertionError: Number of image channels needs to be an integer` which should be

- `yield` used inside a comprehension (which breaks code in Python 3.8)

- Write operation on file that is opened in read-only mode

- I/O detected on a closed file descriptor
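The first item in the list above can be reproduced in a few lines. This is only an illustrative sketch, not DeepSource's actual check: the function names `check_channels`, `check_channels_fixed` and `raised_type` are made up for the example. It shows that the second operand of `assert` is used as the message and is never raised itself.

```python
def check_channels(num_channels):
    # The expression after the comma is only the assertion *message*.
    # The ValueError instance is never raised; AssertionError is.
    assert isinstance(num_channels, int), ValueError(
        "Number of image channels needs to be an integer"
    )
    return num_channels


def check_channels_fixed(num_channels):
    # Raising explicitly gives callers the exception type they expect.
    if not isinstance(num_channels, int):
        raise ValueError("Number of image channels needs to be an integer")
    return num_channels


def raised_type(func, value):
    # Helper for the example: report the name of the exception type raised.
    try:
        func(value)
    except Exception as exc:
        return type(exc).__name__
    return None
```

Calling `raised_type(check_channels, "3")` reports an `AssertionError`, while the explicit `raise` in `check_channels_fixed` surfaces the intended `ValueError`.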

> 2. Type checking? Use `mypy`: it just a single command!

Sure, if one prefers running it locally or as part of their CI. But if you already use DeepSource to flag issues, type checking can be enabled with a single line in the `.deepsource.toml` file.

> 3. Autofixing? Use `black` / `autopep8` / `autoflake` and you can use `pybetter` to have the same ~15 auto-fix rules. But, it is completely free and open-source

We are working on adding support for autopep8, black and autoflake in the coming weeks. They mostly auto-patch stylistic issues [1]. Thanks for letting us know about pybetter. It looks like a great tool and fixes ~9 issues [2]. The aim of DeepSource’s Autofix is to fix more than three-quarters of the issues we detect, and our Python analyzer detects 522 issues. We have a dedicated engineering team actively working on the analyzers. As of today, the following are some of the issues our Python analyzer can autofix (which I couldn’t find among the open-source tools):

- No use of `self`

- Usage of dangerous default argument

- Module imported but unused

- Function contains unused argument

- Debugger import detected

- Debugger activation detected

- Unnecessary comprehension

- Unnecessary literal

- Unnecessary call

- Unnecessary typecast

- Bad comparison test

- Empty module

- Built-in function `len` used as condition

- Unnecessary `fstring`

- `raise NotImplemented` should be `raise NotImplementedError`

- `assert` statement used outside of tests

The same goes for Go and the other analyzers we support.
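For readers unfamiliar with the "dangerous default argument" item in the list above, here is a minimal sketch of the problem and of the usual manual fix. The function names are hypothetical, and this is not meant to represent the patch DeepSource's Autofix actually generates.

```python
def append_bad(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` shares and mutates the same list.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # Using None as a sentinel gives each call its own fresh list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

With `append_bad`, state leaks between calls that rely on the default; with `append_good`, each call starts from an empty list.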

> I don't like this whole idea of such tools (both technically and ethically):

> Why would anyone want to send all their codebase to 3rd party? We used to call it a security breach back in the days.

We follow strict security practices [3]. In short: 1) we do not store your code, 2) source code is pulled into an isolated environment that has no access to any of our internal systems or the external network, and 3) as soon as the analysis is completed, the environment is destroyed and all logs are purged. Also, there are many tools that developers use every day (Travis CI, Circle CI, GitHub) where the source code is sent to the cloud, and I don't think it is accurate to call that a security breach. That said, we have an on-premise setup of DeepSource on the roadmap. We’re working on SOC 2 Type 2 compliance as well [4].

> On moral side, this (and similar) projects look like thin wrappers around open-source tools but with a monetisation model. How much do these companies contribute back to the original authors of pylint, mypy, flake8? Ones who created and maintained them for years. I will be happy to be wrong here

We have kept the tool completely free to use for open-source projects. We’ve also partnered with GitHub Education and made it free for students. We’re an early-stage company trying to build a business in automating the objective parts of code review and making it easier for every developer to adopt and use static analysis. In full transparency, we had plans to sponsor open-source projects but got sidetracked due to various reasons. We will be backing some of the open-source projects in the next couple of weeks.

[1] https://gist.githubusercontent.com/jaipradeesh/6ad8404fef253...

[2] https://gist.githubusercontent.com/jaipradeesh/b8a0e6b526f73...

[3] https://deepsource.io/security

[4] https://vanta.com/guides/vantas-guide-to-soc-2

dolftax · on March 15, 2020

DeepSource integrates with GitHub checks [1] and via the dashboard, you can select the issue types (anti-patterns, bug risks, performance and security issues, style, type checks and documentation), which when detected, will cause analysis runs to fail and pull requests to be blocked.

[1] https://pasteboard.co/IZfSThC.png

[2] https://pasteboard.co/IZfT8uw.png

dolftax · on March 12, 2020

We'll tweet about it at https://twitter.com/deepsourcehq

dolftax · on March 12, 2020

There are two GitHub apps we maintain. One with read access (DeepSource) and one with write access (DeepSource Autofix).

By default, on signup, you would be installing the app with read access -- this enables us to pull source code from GitHub on every commit and pull-request, run analysis and report issues as GitHub checks. This is sufficient if you would like to use DeepSource only to flag issues.

With the release of Autofix -- when a fix is available for a flagged issue, DeepSource creates a pull request to the repository with the patch. For this, you would be asked to install the app with write access (DeepSource Autofix). Note that DeepSource always creates a separate branch with the fixes and then opens a pull request from it. We do not perform any write operations beyond the above-mentioned scope.

dolftax · on March 12, 2020

Sure. I've left you an email.

dolftax · on March 11, 2020

We went ahead with integration with providers like GitHub and GitLab to have these checks in a central place as it is the easiest way for a team to adopt a tool like ours. Also, just having a local or IDE plugin doesn't ensure these issues never make it to trunk unless everyone in the team follows it strictly.

That said, for the convenience of developers, we're working on the ability to run the analysis and the fixes using our CLI. [1] This opens up doors to use the CLI and build IDE plugins in the near future.

[1] https://github.com/deepsourcelabs/cli/issues/15

dolftax · on March 11, 2020

> Why is this needed? Can’t you imply the necessary analyzers from my codebase?

Sure. We can probably infer the languages used in the repository. But we need metadata like test glob patterns, exclude patterns and runtime versions (Python 2, Python 3) to improve the accuracy of issues. For ex: using the assert statement in application logic is discouraged, as it is removed when compiling to optimised bytecode (python -O producing *.pyo files). Ideally, the assert statement should be used only in tests. Also, we haven’t found a way to infer Python 2 vs Python 3 accurately. Can you think of a way? That would be helpful.
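To make the point about `assert` in application logic concrete, the sketch below compiles the same snippet with and without optimisation, using the `optimize` argument of the built-in `compile()` (level 1 corresponds to `python -O`). The snippet, the `amount` variable and the `run` helper are made up for illustration.

```python
SOURCE = (
    "assert amount > 0, 'amount must be positive'\n"
    "result = 'charged'\n"
)


def run(optimize):
    # optimize=0 keeps assert statements; optimize=1 strips them,
    # which is what running the interpreter with -O does.
    namespace = {"amount": -5}
    code = compile(SOURCE, "<example>", "exec", optimize=optimize)
    try:
        exec(code, namespace)
    except AssertionError:
        return "AssertionError"
    return namespace["result"]
```

Here `run(0)` stops at the assertion, while `run(1)` silently skips the check and goes on to "charge" a negative amount, which is why such validation belongs in an explicit `if` / `raise` in application code.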

> There is no support for Javascript, Typescript, PHP, Java or C#. No HTML or CSS support. Is there a roadmap?

We strongly believe in starting out with a few languages and adding as many issues as we can (with the ability to autofix most of them) -- before we go broad. That said, we released Ruby in beta a couple of weeks back and are currently working on the stable release. We’ve also started working on JavaScript (with TypeScript support) a month back, and we should release the beta version of the JavaScript analyzer in approximately a month from now.

> Also, the name implies use of Deep Networks and AI. Am I mistaken? If not, what kind of AI is used here? Seems like just an automatic runner of static analysis tools.

It’s just the name :) We do not use Machine Learning or AI at the moment — the reason being we’re optimizing for high accuracy, and a rules engine that uses AST parsing helps us do that reliably. We do plan to use learning in the future to capture data around which issues are being fixed the most and which are not, and then show issues in the most relevant order to users depending on their context.

dolftax · on Feb 18, 2020

We've been using API Tracker in production for a few weeks now. The primary use case for us is to reliably handle webhooks from GitHub, which our product relies on heavily (app installation, commit and pull request events).

Unfortunately, GitHub doesn't retry failed webhooks, and when our service goes down for a few seconds, thousands of webhooks fail and pile up. GitHub doesn't provide an API to query and retry the failed webhooks either. We had to go through the painstaking task of visiting GitHub's app dashboard and clicking retry on each webhook, one by one.

With API Tracker in place, we've updated our GitHub app's webhook delivery URL to send the webhooks to API Tracker, which forwards them to our services. In the worst case, when our service goes down for a while, API Tracker gracefully retries all the failed webhooks.
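As a rough illustration of the forward-and-retry behaviour described above, here is a minimal sketch of retrying a delivery with exponential backoff. It is not how API Tracker is implemented; `deliver_with_retry`, its parameters and the `send` callable are all hypothetical names chosen for the example.

```python
import time


def deliver_with_retry(send, payload, max_attempts=5, base_delay=0.5,
                       sleep=time.sleep):
    # `send` is any callable that forwards the webhook payload downstream
    # and raises ConnectionError while the downstream service is unavailable.
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload)
        except ConnectionError:
            if attempt == max_attempts:
                # Give up and let the caller decide what to do next.
                raise
            # Wait 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so the backoff can be exercised in tests without actually waiting. A real forwarder would also persist pending payloads so that they survive a restart.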

Ref: https://github.community/t5/GitHub-API-Development-and/Handl...

thorgaardian · on Feb 18, 2020

Interesting use-case for it. Without prior knowledge of a solution like this I would have suggested you send the webhooks to a queue-backed notification system (e.g. SNS backed by SQS) and subscribe to the event topic, but it sounds much easier to configure and manage the way you instrumented it. Might be a good use-case for me to try out!

cameroncooper · on Feb 18, 2020

This is something you can easily configure with our automatic retry function. We have an option to return a pre-configured response to the caller, and put the request in a queue to be retried until successful. This allows you to have a sustained outage while making sure all calls are eventually delivered.

ignoramous · on Feb 18, 2020

> This allows you to have a sustained outage while making sure...

Re-driving queue backlogs at services recovering from sustained outages ends in tears almost always. Tread carefully. :)

jrockway · on Feb 19, 2020

Typically people use two pools for circuit breaking, with the limit set lower on retries: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overv...

capableweb · on Feb 18, 2020

Yeah, this is what I've seen most services that rely on webhooks from another service do. Add in some monitoring of how many events are not yet processed (set an alarm when there are X events in the queue) and you're done!
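The queue-plus-alarm idea above might look roughly like the following in-process sketch. It assumes a simple in-memory queue standing in for something like SQS; the `WebhookBuffer` class and its methods are hypothetical names for the example.

```python
import queue


class WebhookBuffer:
    def __init__(self, alarm_threshold):
        # In-memory stand-in for a durable queue service.
        self._events = queue.Queue()
        self.alarm_threshold = alarm_threshold

    def receive(self, event):
        # Accept the webhook immediately; processing happens later.
        self._events.put(event)

    def backlog(self):
        # Number of events received but not yet processed.
        return self._events.qsize()

    def alarm(self):
        # True once the backlog reaches the configured threshold.
        return self.backlog() >= self.alarm_threshold

    def drain(self, handler):
        # Process everything currently queued and return the count.
        processed = 0
        while True:
            try:
                event = self._events.get_nowait()
            except queue.Empty:
                return processed
            handler(event)
            processed += 1
```

A monitoring job would poll `alarm()` and page someone when it turns true, while workers call `drain()` (or its equivalent) to work through the backlog.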

disposedtrolley · on Feb 18, 2020

We're currently building a GitHub integration which receives webhooks and kicks off a bunch of processing actions based on the event type. Your suggestion sounds like a great way to add some observability to the service -- thanks!

bpicolo · on Feb 18, 2020

> In worst case when our service goes down for a while

The worst case is still the same, no? API tracker goes down, GitHub has no redelivery, same deal. More a matter of whose uptime you trust more in this regard.

(That's not to say it's not valuable for this use case)

dolftax · on Feb 18, 2020

Sure. The least we expect from any service sending webhooks is a built-in retry strategy. GitHub doesn't have one. We were thinking of building this ourselves internally, but if someone takes care of it for you reliably, why not?

For API Tracker, it isn't good for business if their services go down even for a short while. Though we've been using API Tracker for only a few weeks, we have had zero failed webhook deliveries. They say they've designed their systems with this as a primary goal, of course. What if AWS or GCP goes down? It's a matter of trust and SLAs.

ignoramous · on Feb 18, 2020

> What if AWS or GCP goes down. It's a matter of trust and SLAs.

AWS does have a 100% uptime SLA on some of its services: Route53, for example [0]. Not saying that ApiTracker could not be a 100% uptime service (in fact, it looks like that's their explicit goal), just pointing out that AWS / GCP do have services that never "go down" barring global catastrophes.

[0] https://aws.amazon.com/blogs/architecture/a-case-study-in-gl... -- Route 53’s foremost goal is to always meet our promise of a 100% SLA for DNS queries – that all of our customers’ DNS names should resolve all the time.

ignoramous · on Feb 18, 2020

Thanks.

At $349 for 1M calls, doesn't it get expensive? I'd reckon web-hooking it to Step Functions + AWS Lambda or SNS + SQS would have been a much more cost-effective solution, at the cost of additional resources devoted to development and maintenance, of course. So, if you're comfortable sharing, what did the TCO economics look like for you when you decided to use ApiTracker instead?

the_arun · on Feb 19, 2020

Don't integrators like IFTTT already support GitHub integration? How is API Tracker different from IFTTT?